EXPLICIT STATE VECTOR REPRESENTATION
FOR  HETEROASSOCIATIVE  MEMORIES 


INTRODUCTION 

A critical issue in the development of associative memory (AM) models is the
representation of information. Typically, information to be processed is presented in the form
of a fixed-length vector of binary (two-state) variables. Clearly, the binary (0, 1) representation
is sufficient; for a variety of reasons, however, the bipolar (-1, +1) scheme is more commonly
used. One reason in particular is that it avoids normalization considerations because every
n-length bipolar vector has the same magnitude, $\sqrt{n}$. This report examines an alternative vector
representation that maintains the advantages of the bipolar form while leading to more powerful,
nonlinear AM formulations.

Given a set of bipolar vector pairs $\{(u_1, v_1), \ldots, (u_n, v_n)\}$, a bidirectional associative
memory $M$ can be constructed as a sum of outer products:

$$M = \sum_{i=1}^{n} u_i^T v_i. \tag{1}$$

This matrix is referred to as being bidirectionally associative [1] because for a learned pair
$(u_i, v_i)$ it has the property that $\phi(u_i M) = v_i$ and $\phi(v_i M^T) = u_i$, where the function
$\phi(\cdot)$ converts the elements of a vector to bipolar form (according to sign). In this single-pass
model of an AM, associations are maintained as a matrix of first-order correlations between elements
of the input vectors and elements of the output vectors. It is easy to establish that such a model can
guarantee perfect recall only if the condition $u_i u_j^T = 0$, $i \neq j$, is satisfied because $u_i M$
can be expressed as ([1] Eq. (17)):

$$u_i M = u_i u_i^T v_i + \sum_{j \neq i} u_i u_j^T v_j. \tag{2}$$
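
As an illustration of Eqs. (1) and (2), the following minimal sketch builds the outer-product memory with NumPy and checks recall in both directions. The example vectors (rows of a 4x4 Hadamard matrix) and the helper names `sign_bipolar` and `build_bam` are assumptions chosen for the illustration; they do not come from the report.

```python
import numpy as np

def sign_bipolar(x):
    """Convert to bipolar form by sign; a zero is mapped to +1 (arbitrary tie rule)."""
    return np.where(x >= 0, 1, -1)

def build_bam(U, V):
    """Sum of outer products u_i^T v_i over the rows of U and V, as in Eq. (1)."""
    return U.T @ V

# Illustrative data: both sets are mutually orthogonal bipolar row vectors,
# so the crosstalk sum in Eq. (2) vanishes in both directions.
U = np.array([[1,  1,  1,  1],
              [1, -1,  1, -1],
              [1,  1, -1, -1]])
V = np.array([[1, -1, -1,  1],
              [1,  1,  1,  1],
              [1, -1,  1, -1]])

M = build_bam(U, V)
forward = sign_bipolar(U @ M)      # recall each v_i from u_i
backward = sign_bipolar(V @ M.T)   # recall each u_i from v_i
```

With orthogonal inputs, `U @ M` equals `4 * V` exactly, which is the first term of Eq. (2) with $u_i u_i^T = 4$.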

This  orthogonality  constraint  can  be  relaxed  (at  the  possible  expense  of  bidirectional  recall)  to 
simple  linear  independence  by  a  more  powerful  recursive  formulation  using  the  Widrow-Hoff  delta 
rule  [2]: 

$$M(i) = M(i-1) + \alpha\, u_i^T \left(v_i - (u_i M(i-1))\right), \tag{3}$$

where $\alpha$ is a weighting factor that determines the influence of the residual error in changing the
state of the memory. The attractive features of this model are that it is provably convergent when
$\alpha$ is sufficiently small and that it yields the LMS (least mean squares) solution when the input
vectors are linearly dependent. Unfortunately, like all first-order bipolar correlation memories, this
model possesses the constraining symmetry that if the sign-thresholded recall satisfies $\phi(uM) = v$,
then for the complement $\bar{u}$ of $u$, $\phi(\bar{u}M) = \bar{v}$ (in bipolar form $\bar{u} = -u$, so
$\bar{u}M = -uM$). In other words, first-order bipolar AMs are incapable of storing distinct associations
for complementary vectors.
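
The recursive rule and the complementation symmetry can be sketched as follows. This is a hedged illustration, not the report's implementation: it assumes the plain linear residual of the classic Widrow-Hoff rule, applies the sign threshold only at recall, and uses made-up training vectors, step size, and epoch count.

```python
import numpy as np

def sign_bipolar(x):
    """Convert to bipolar form by sign; a zero is mapped to +1 (arbitrary tie rule)."""
    return np.where(x >= 0, 1, -1)

def train_delta(U, V, alpha=0.05, epochs=200):
    """Repeatedly apply the delta-rule update to each training pair in turn."""
    M = np.zeros((U.shape[1], V.shape[1]))
    for _ in range(epochs):
        for u, v in zip(U, V):
            residual = v - u @ M                # error of the current memory on this pair
            M += alpha * np.outer(u, residual)  # alpha * u^T * residual
    return M

# Linearly independent but non-orthogonal bipolar inputs (illustrative values).
U = np.array([[1, 1,  1,  1],
              [1, 1,  1, -1],
              [1, 1, -1, -1]])
V = np.array([[ 1, -1],
              [-1,  1],
              [ 1,  1]])

M = train_delta(U, V)
recalled = sign_bipolar(U @ M)              # recall for the trained inputs
recalled_complement = sign_bipolar(-U @ M)  # recall for their complements
```

Because the complement of a bipolar vector is its negation, the recall for each complemented input is forced to be the complement of the stored output, whatever the training set contained.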


THE  EXPLICIT  STATE  REPRESENTATION 

Symmetry under complementation in bipolar AMs derives from the fact that elemental correlations
are computed as simple products that are invariant under commutation. This symmetry can
be viewed statistically as a tacit assumption that $P(b|a) = 1 - P(b|\bar{a})$ for all elements $a$ of a
vector and all elements $b$ of its associated vector. To relax this assumption, then, a noncommutative
correlation operator is required that maps each of the four distinct elemental pair possibilities to
a unique result. Fortuitously, the vector outer product operator can be employed for this purpose
by simply representing (0, 1) states with the two-element orthonormal vectors ([0 1], [1 0]). Vectors
whose states are represented in this fashion are referred to as being in explicit state (ES) form. For
example, the vector [0 1 0 1] is represented in ES form as [0 1 1 0 0 1 1 0]. Although ES form
is not as compact as bipolar, ES vectors can be processed exactly like bipolar vectors in the AM
model described in Eq. (3) (except that the $\phi(\cdot)$ function uses relative magnitudes of state pairs
when coercing vectors to strict ES form) in order to eliminate the complementation symmetry.
Normalization issues are avoided since the ES form of every n-length binary vector has magnitude
$\sqrt{n}$.
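
The encoding just described can be sketched in a few lines. The function names `to_es`, `coerce_es`, and `from_es` are hypothetical, and resolving ties toward the [1 0] state in `coerce_es` is an arbitrary assumption that the report does not specify.

```python
import numpy as np

def to_es(bits):
    """Binary vector -> ES form, mapping 0 -> [0 1] and 1 -> [1 0]."""
    bits = np.asarray(bits)
    return np.stack([bits, 1 - bits], axis=1).reshape(-1)

def coerce_es(x):
    """Coerce a real-valued vector to strict ES form by comparing each state pair."""
    pairs = np.asarray(x, dtype=float).reshape(-1, 2)
    first_wins = pairs[:, 0] >= pairs[:, 1]   # ties go to the [1 0] state (assumption)
    return np.stack([first_wins, ~first_wins], axis=1).astype(int).reshape(-1)

def from_es(es):
    """Strict ES vector -> binary vector (first element of each state pair)."""
    return np.asarray(es).reshape(-1, 2)[:, 0]

es = to_es([0, 1, 0, 1])        # the example from the text
norm = np.linalg.norm(es)       # sqrt(n) for an n-length binary vector
```

For the four-element example, `es` has exactly one nonzero entry per state pair, so its magnitude is $\sqrt{4} = 2$.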


To contrast the bipolar and ES forms, consider the result of storing a mapping from training
examples in which a complementary pair of vectors is associated with the same vector. In the bipolar
model, this mapping of anticorrelated vectors to correlated ones results in complete destructive
interference. Specifically, the sum of the two correlation matrices yields a zero matrix. In the ES
model, however, no interference results. The one-to-many inverse mapping necessarily produces
destructive interference for any model, but the difference between the bipolar and ES formulations
is that in the bipolar model of Eq. (1), the noise equally affects the forward and inverse channels
(i.e., $M$ and $M^T$) while in the ES model, these channels are independent. The character of the
channel independence in the ES representation is easily demonstrated given vectors $u$ and $v$ and
observing that:

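$$
\begin{aligned}
u^T v \odot \bar{u}^T v &= 0, \\
u^T v \odot u^T \bar{v} &= 0, \\
v^T u \odot \bar{v}^T u &= 0, \\
v^T u \odot v^T \bar{u} &= 0,
\end{aligned}
\tag{4}
$$

where $\odot$ is the Hadamard product $C = A \odot B$ defined as $c_{ij} = a_{ij} b_{ij}$.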
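
The contrast between the two forms can be checked numerically. In this sketch the vectors `u` and `v` are arbitrary illustrative values and `to_es` is a hypothetical helper; a complementary pair of inputs is stored against the same output in both representations.

```python
import numpy as np

def to_es(bits):
    """Binary vector -> ES form, mapping 0 -> [0 1] and 1 -> [1 0]."""
    bits = np.asarray(bits)
    return np.stack([bits, 1 - bits], axis=1).reshape(-1)

u = np.array([1, 0, 1, 1])
v = np.array([0, 1, 1])
u_bar = 1 - u                     # binary complement of u

# Bipolar form: the two correlation matrices cancel exactly.
ub, vb = 2 * u - 1, 2 * v - 1
M_bipolar = np.outer(ub, vb) + np.outer(-ub, vb)

# ES form: the two correlation matrices occupy disjoint entries.
ue, ube, ve = to_es(u), to_es(u_bar), to_es(v)
M_es = np.outer(ue, ve) + np.outer(ube, ve)
overlap = np.outer(ue, ve) * np.outer(ube, ve)   # Hadamard product of the two terms

forward_u = ue @ M_es        # proportional to ve
forward_u_bar = ube @ M_es   # also proportional to ve
inverse = ve @ M_es.T        # equal weight on u and its complement
```

The forward channel recalls `ve` from either input, while the inverse recall from `ve` weights every state of `u` and of its complement equally, which is the unavoidable one-to-many ambiguity mentioned above.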

An examination of the complementation symmetry reveals that the bipolar model in Eq. (1)
is only capable of learning invertible mappings. (In fact, effects from anticorrelated bit pairs
introduced in the ES formulation make the bipolar form of Eq. (1) better suited for one-pass learning
of invertible mappings. However, with a more sophisticated retrieval method, the ES version can
surpass the bipolar performance even in this case.) The advantage of the ES representation for Eq.
(3) can be demonstrated by taking any three of the possible four 2-bit binary vectors (even the
zero vector) and noting that the ES conversion transforms the linearly dependent set to one that
is linearly independent. For example, the linearly dependent set {[0 1], [1 0], [1 1]} is transformed
to the linearly independent set {[0 1 1 0], [1 0 0 1], [1 0 1 0]}. A deeper analysis yields the following
theorem:

Theorem:  The  explicit-state  form  of  Eq.  (3)  converges  to  the  following  nondistributive,  hence 
nonlinear,  affine  transformation: 

$$y = xT + b, \tag{5}$$

where  T  is  a  matrix  and  b  is  a  constant  vector. 

Proof: Writing $\bar{u}$ as $1 - u$, and similarly for $v$, the ES forms of binary vectors $u$ and $v$ can
be partitioned as $[u \mid (1 - u)]$ and $[v \mid (1 - v)]$, respectively, simply by separating the even and odd
elements. The transformation then takes the form:

$$[u \mid (1 - u)]\, M = [v \mid (1 - v)]. \tag{6}$$

Partitioning $M$ as

$$M = \begin{bmatrix} A & B \\ C & D \end{bmatrix} \tag{7}$$

yields the following bipolar expression for $v$:

$$v_{(\mathrm{bipolar})} = uA + (1 - u)C - uB - (1 - u)D, \tag{8}$$

which simplifies to

$$v_{(\mathrm{bipolar})} = u(A - B - C + D) + 1(C - D). \tag{9}$$

Letting $T = (A - B - C + D)$ and $b = 1(C - D)$ completes the proof.
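
Two of the claims above lend themselves to a quick numerical check: the rank of the three-vector example before and after ES conversion, and the identity between Eqs. (8) and (9). The sketch below uses an arbitrary random matrix `M` rather than a trained memory, and the names `to_es`, `S`, and `B_blk` are assumptions made for the illustration.

```python
import numpy as np

def to_es(bits):
    """Binary vector -> ES form, mapping 0 -> [0 1] and 1 -> [1 0]."""
    bits = np.asarray(bits)
    return np.stack([bits, 1 - bits], axis=1).reshape(-1)

# Rank of the example set in binary form and in ES form.
S = np.array([[0, 1], [1, 0], [1, 1]])
rank_binary = np.linalg.matrix_rank(S)
rank_es = np.linalg.matrix_rank(np.array([to_es(s) for s in S]))

# Affine form: any matrix acting on ES vectors reduces to u T + b.
rng = np.random.default_rng(0)
n, m = 4, 3
M = rng.normal(size=(2 * n, 2 * m))

# ES form interleaves the u and (1 - u) parts, so the blocks of Eq. (7)
# are obtained by separating even and odd rows and columns.
A, B_blk = M[0::2, 0::2], M[0::2, 1::2]
C, D = M[1::2, 0::2], M[1::2, 1::2]
T = A - B_blk - C + D
b = np.ones(n) @ (C - D)

u = rng.integers(0, 2, size=n)
out = to_es(u) @ M
v_bipolar = out[0::2] - out[1::2]   # Eq. (8): first state minus second state
affine = u @ T + b                  # Eq. (9)
```

The two results agree to floating-point precision for any binary `u`, since the derivation does not depend on how `M` was obtained.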

One important consequence of this result is that it assures that any mapping of a linearly
independent set of input vectors can be learned even in the presence of constant additive noise. This
is surprising because an additive noise vector always exists that transforms a linearly independent
set of vectors to one that is linearly dependent. (A trivial case is the addition of a vector $z$ to
each vector in a linearly independent set $S$ when $-z \in S$.) In other words, the ES representation
somewhat relaxes the linear independence condition required for perfect learning in Eq. (3). This
can also be accomplished in the bipolar model by appending a constant "1" to each vector. The
advantage of the ES representation is that recall from an incomplete vector (i.e., containing [0 0]
states) results in a transformation that subtracts the contributions of the missing vector positions
from both $T$ and $b$. In the bipolar model, however, the additive component of the affine
transformation is unaffected by zero entries in the input vector. Thus, the ES representation generally
should have better recall performance from incomplete inputs than the bipolar representation.

In  addition,  the  ES  representation  supports  probabilistic  measurements  with  confidence  factors. 
Specifically, a state $[\alpha\ \beta]$ can represent knowledge that an event is true with probability $\alpha$ and is
false with probability $\beta$, where $\alpha + \beta \leq 1$. For example, if a measurement process with confidence
$q$ (i.e., it is accurate to within some acceptable tolerance bounds with probability $q$) suggests that a
given event is true with probability $p$, this information could be encoded as $[pq\ \ (1 - p)q]$. Thus, the
results of a measurement having zero confidence would be treated by the recall process as though
no information about the event were available. This is not equivalent to encoding the state as
[0.5 0.5].
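
The confidence-weighted encoding can be written as a one-line function. The name `encode_measurement` and the range checks are assumptions added for the sketch.

```python
def encode_measurement(p, q):
    """Encode 'true with probability p' measured with confidence q as an ES state.

    Returns the pair (p*q, (1-p)*q); its elements sum to q, so they never exceed 1.
    """
    if not (0.0 <= p <= 1.0 and 0.0 <= q <= 1.0):
        raise ValueError("p and q must both lie in [0, 1]")
    return (p * q, (1.0 - p) * q)

confident = encode_measurement(0.8, 0.5)   # moderately trusted measurement
unknown = encode_measurement(0.8, 0.0)     # zero confidence: the [0 0] state
coin_flip = encode_measurement(0.5, 1.0)   # fully trusted, maximally uncertain event
```

The last two calls show the distinction drawn in the text: zero confidence yields the empty state [0 0], whereas [0.5 0.5] asserts with full confidence that the event is equally likely to be true or false.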

HIGHER  ORDER  ASSOCIATIVE  MEMORIES 

The  definition  of  an  AM  network  given  in  Eq.  (3)  uses  only  first-order  correlations  in  learning  a 
heteroassociative  mapping.  It  is  well  known,  however,  that  many  important  mappings  (e.g.,  XOR) 
require  the  use  of  higher  order  correlations.  Fortunately,  this  can  be  achieved  without  altering  the 
AM  model  simply  by  transforming  the  vectors  so  that  the  higher  order  autocorrelation  information 
becomes  explicit  in  the  first-order.  For  example,  a  vector  u  can  be  transformed  to  a  vector  x  so 
that  the  second-order  information  in  u  is  first-order  explicit  in  x  as  follows: 

$$x = [\, u_1 u_2 \mid \ldots \mid u_i u_j \mid \ldots \,], \quad i < j. \tag{10}$$

In this case the transformation is equivalent to collecting the upper triangular elements of the
autocorrelation matrix $u^T u$ as a single vector $x$. The generalization to kth-order is straightforward:

$$x = [\, u_1 u_2 \cdots u_k \mid \ldots \mid u_{i_1} u_{i_2} \cdots u_{i_k} \mid \ldots \,], \quad i_1 < i_2 < \ldots < i_k. \tag{11}$$

In the bipolar representation it should be apparent that the k-term products can provide only
parity information, i.e., a given product will be negative if and only if an odd number of terms are
negative; otherwise it will be positive. Thus, for correlations of order > 2, it is doubtful that the
information added to the bipolar representation would be of significant practical value. The direct
application of this unfolding process to ES vectors is similarly inadequate; however, this can be
remedied by generalizing the ES representation.
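
The unfolding of Eqs. (10) and (11) and its effect on XOR can be sketched for the bipolar case as follows. The function name `unfold` and the use of the one-pass outer-product memory of Eq. (1) for the demonstration are choices made for this illustration.

```python
import numpy as np
from itertools import combinations

def unfold(u, k=2):
    """Collect the products of all k-element subsets of u, indices in increasing order."""
    u = np.asarray(u)
    return np.array([np.prod(u[list(idx)])
                     for idx in combinations(range(len(u)), k)])

# Bipolar XOR: the output is +1 exactly when the two inputs differ.
inputs = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]])
xor = np.array([[-1], [1], [1], [-1]])

# First-order memory: the input-output correlations cancel completely.
M1 = inputs.T @ xor
first_order_recall = inputs @ M1

# Second-order memory: the single product u_1 u_2 carries the parity.
X = np.array([unfold(u) for u in inputs])
M2 = X.T @ xor
second_order_recall = np.sign(X @ M2)
```

Here the first-order memory is the zero matrix, while the second-order term reproduces XOR exactly, consistent with the remark that bipolar products encode parity.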

The  motivation  for  the  ES  representation  was  the  elimination  of  symmetries;  however,  because 
symmetry  operations  always  imply  information  loss  (in  the  form  of  irreversibility),  the  principle 
behind  the  ES  representation  can  be  viewed  as  one  of  information  maximization.  Specifically,  no 
first-order information is lost in the vector correlation process under the ES model. Thus, the
generalized ES model should perform likewise for higher order correlations. This can be accomplished
by representing kth-order states with a set of orthonormal $2^k$-length vectors such that any ordered
k-element subset of a vector maps to a unique correlation state (in direct analogy to the first-order
extension of binary to ES form). Because the mapping is reversible, no loss of information occurs.
Unfortunately, the transformation of a vector of length $n$ to kth-order ES form results in a vector
of length

$$\binom{n}{k}\, 2^k,$$

where the first factor is the binomial coefficient giving the number of k-element
subsets of an n-element set. Thus, the practical use of very-high-order correlations is severely
limited. However, third- and fourth-order correlations for vectors having 25 to 50 elements are within
the realm of feasibility for several currently available massively parallel computers, and vectors
having more than 500 to 1000 elements can be processed by using second-order correlations.
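
One possible reading of the generalized encoding, together with the length formula, is sketched below. The report does not say how joint states are assigned to the $2^k$ orthonormal vectors, so the one-hot layout in `to_es_order_k` is an assumption, chosen so that the first-order case reproduces the ES form used earlier; both function names are hypothetical.

```python
import numpy as np
from itertools import combinations
from math import comb

def es_length(n, k):
    """Length of the kth-order ES form of an n-length binary vector."""
    return comb(n, k) * 2 ** k

def to_es_order_k(bits, k):
    """Map every k-element subset (indices increasing) to a one-hot block of length 2**k."""
    bits = np.asarray(bits)
    blocks = []
    for idx in combinations(range(len(bits)), k):
        state = 0
        for b in bits[list(idx)]:
            state = (state << 1) | int(b)      # joint state read as a k-bit number
        one_hot = np.zeros(2 ** k, dtype=int)
        one_hot[2 ** k - 1 - state] = 1        # layout chosen so that 1 -> [1 0] when k = 1
        blocks.append(one_hot)
    return np.concatenate(blocks)

first_order = to_es_order_k([0, 1, 0, 1], 1)
second_order = to_es_order_k([1, 0, 1, 1], 2)
sizes = {(n, k): es_length(n, k) for n, k in [(25, 3), (50, 4), (1000, 2)]}
```

For the sizes mentioned in the text, the formula gives 18,400 elements for n = 25 at third order, 3,684,800 for n = 50 at fourth order, and 1,998,000 for n = 1000 at second order.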

The  computational  complexity  associated  with  the  use  of  high-order  correlation  information 
can often be substantially reduced by eliminating redundancy in the raw vectors. For example,
in many practical applications, training vectors are generated by measuring a large number of
variables or parameters without regard to (or, possibly, without the ability to determine) their
statistical  independence  or  information  content.  Simply  by  transforming  the  raw  data  by  using 
a  Karhunen-Loeve  Transform  (KLT),  or  some  similar  transform  that  can  be  used  to  maximize 
information  on  limited  channels  [3,4],  the  effective  lengths  of  the  vectors  may  be  dramatically 
reduced.  Thus,  the  order  of  correlation  that  can  be  practically  used  may  be  increased. 
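
A minimal sketch of such a reduction is given below, assuming the KLT is computed from the sample covariance matrix of real-valued measurements. The synthetic data, the function name `klt_reduce`, and the choice of three retained components are illustrative; how the reduced vectors would then be quantized to binary states is not addressed here.

```python
import numpy as np

def klt_reduce(X, m):
    """Project the rows of X onto the m leading eigenvectors of the sample covariance."""
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / (len(X) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:m]     # indices of the m largest eigenvalues
    basis = eigvecs[:, order]
    return Xc @ basis, basis, mean

# Synthetic example: 20 measured variables driven by only 3 independent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 20))
X = latent @ mixing

Y, basis, mean = klt_reduce(X, 3)
X_rec = Y @ basis.T + mean                    # reconstruction from 3 coefficients
```

In this contrived case the 20-element vectors are recovered exactly from 3 coefficients, so a higher correlation order becomes affordable for the same memory size.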


SUMMARY 

In  summary,  it  has  been  shown  that  the  commutative  correlation  operator  used  in  most  linear 
associative  memory  (AM)  models  enforces  symmetries  that  preclude  the  learning  of  several  classes 
of  important  mapping  functions.  It  has  also  been  shown,  however,  that  these  symmetries  can  be 
eliminated  simply  by  using  a  different  information  representation  scheme.  The  explicit  state  (ES) 
representation  has  been  proposed  as  an  alternative  to  the  commonly  used  bipolar  form  and  has  been 
demonstrated  to  permit  standard  AM  architectures  to  learn  nonlinear  mappings.  In  particular, 
it  has  been  shown  that  the  ES  representation  permits  first-order  AM  models  to  learn  nonlinear 
transformations of the form $y = xT + b$. This characterization is important because its properties
are  directly  amenable  to  analysis  by  using  the  known  properties  of  affine  transformations.  For 
example,  it  has  been  noted  that  this  transformation  renders  the  ES  formulation  immune  to  the 
effects  of  constant  additive  noise.  It  has  also  been  shown  that  the  ES  representation  can  be  easily 
generalized for the use of higher order correlation information.